
    Uniform random generation of words of rational languages

    Summary: We give two algorithms for the uniform random generation of words, which apply to particular classes of rational languages. Their efficiency is measured in terms of logarithmic complexity, as a function of the length n of the generated words. The first algorithm is dedicated to languages whose generating series have a unique pole, possibly multiple; its time complexity is of order n log n, and the memory space it uses is of order log n. The second algorithm is restricted to languages whose generating series have the following property: there exists a unique pole of smallest modulus, and this pole is simple. After a preprocessing step whose time is polynomial in n, any word is drawn at random in linear average time and linear space.
    Abstract: The problem of generating words of a given language uniformly at random has been the subject of extensive study in the last few years. An important part of that work is devoted to the generation of words of context-free languages (see, e.g., [6, 8, 9, 12]). For a given integer n > 0, the words of length n of any unambiguous context-free language can be generated uniformly at random by algorithms derived from the general method introduced by Wilf [14, 15] and systematized by Flajolet et al. [7]. In particular, this applies to rational languages, which constitute an important special case of context-free languages. Most authors use the uniform measure of complexity (see [1]) to evaluate generation algorithms. This measure is based on the following hypotheses: any simple arithmetic operation (addition, multiplication) has time cost O(1), and any number takes a constant amount of memory space.
Thus, words of any rational language can be generated by an algorithm which, with respect to the uniform measure of complexity, runs in linear time (in terms of the length of the words) and constant space [9]. This measure is realistic only if there is a reasonable bound on the numbers involved in the operations. However, the classical random generation algorithms involve operations on numbers which grow exponentially with the length of the words to be generated. Moreover, programs based on these algorithms are generally used to generate very long words, for example to study the asymptotic behavior of some parameters. Therefore, the uniform measure does not reflect the real behavior of such programs. The logarithmic measure of complexity is much more realistic: one assumes that a number k takes space O(log k), and that any simple arithmetic operation on such a number can be done in time O(log k). It is with respect to this measure that we evaluate the performance of algorithms in this paper.
Our goal is to design efficient algorithms (in terms of logarithmic complexity) to generate uniformly at random words from certain classes of rational languages. We consider rational languages defined by their minimal deterministic finite automata. When computing complexity, neither the size of the automaton nor the cardinality of the alphabet is taken into account.
In Section 2 we present some background on rational languages and their generating series. We briefly describe the classical method for generating words of such languages and study its logarithmic complexity. We show that it is at best quadratic for most languages, mainly because of computations on numbers which grow exponentially with the length of the words to be generated.
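    The classical recursive method referred to above can be sketched as follows. This is a minimal illustration, assuming the automaton is encoded as a Python dict mapping each state to a list of (letter, next_state) pairs (a hypothetical encoding, as are the function names). Python's arbitrary-precision integers make the issue discussed above visible: for an exponentially growing language, the stored counts have a number of bits proportional to n.

```python
import random

def count_table(delta, finals, states, n):
    """counts[k][q] = number of words of length k leading from state q to a final state."""
    counts = [{q: int(q in finals) for q in states}]
    for _ in range(n):
        prev = counts[-1]
        counts.append({q: sum(prev[r] for _, r in delta.get(q, [])) for q in states})
    return counts

def random_word(delta, start, finals, states, n, counts=None, rng=random):
    """Draw a word of length n uniformly among the accepted ones (recursive method)."""
    if counts is None:
        counts = count_table(delta, finals, states, n)
    if counts[n][start] == 0:
        raise ValueError("the language has no word of this length")
    word, q = [], start
    for k in range(n, 0, -1):
        # Pick a uniform integer below counts[k][q]; each transition (letter, r)
        # owns a block of counts[k-1][r] consecutive integers.
        x = rng.randrange(counts[k][q])
        for letter, r in delta[q]:
            x -= counts[k - 1][r]
            if x < 0:
                word.append(letter)
                q = r
                break
    return "".join(word)

# Words over {a, b} with no factor "aa": state 1 means "last letter was a".
delta = {0: [("a", 1), ("b", 0)], 1: [("b", 0)]}
counts = count_table(delta, {0, 1}, [0, 1], 200)
w = random_word(delta, 0, {0, 1}, [0, 1], 200, counts)
print(len(w), "aa" in w)              # 200 False
print(counts[200][0].bit_length())    # grows linearly with n
```

    Each step subtracts big integers whose size is proportional to n, which is where the super-linear logarithmic cost comes from.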
To improve the efficiency of the algorithms significantly, we must avoid handling large numbers, or at least substantially decrease the frequency of computations on such numbers. An alternative, briefly discussed in [7] and [12], is to compute with floating-point numbers instead of integers. In this case, the logarithmic time complexity is linear. However, floating-point numbers inevitably lead to approximations which prevent the generation from being exactly uniform.
In Sections 3 and 4 we show that, in some cases, we can avoid computations on large numbers entirely or almost entirely, while keeping the generation exactly uniform. We determine two classes of rational languages for which this is the case.
Section 3 concerns languages whose associated generating series have a unique singularity. We present a simple version of the classical algorithm which avoids handling large numbers entirely. The logarithmic complexity of the method is O(n log n) in time and O(log n) in memory space.
Section 4 focuses on languages whose associated generating series have the following property: there exists a unique singularity of minimum modulus, and this singularity is simple. For such languages we give a probabilistic version of the classical algorithm which generates words randomly while avoiding most computations on large numbers. This method needs a preprocessing stage, which can be done in polynomial time and linear space in terms of the length n of the words. After preprocessing, any word of length n can be generated in average linear time and space.
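    As a toy illustration of the unique-pole case (our own example, not the algorithm of Section 3): the language a1* a2* ... ak* has generating series 1/(1-z)^k, whose only pole, z = 1, has multiplicity k. It has C(n+k-1, k-1) words of length n. A uniform word can be drawn with a "stars and bars" bijection that handles only integers smaller than n + k, that is, numbers of O(log n) bits. The function name random_word_stars is hypothetical.

```python
import math
import random

def random_word_stars(letters, n, rng=random):
    """Uniform word of length n in letters[0]* letters[1]* ... letters[k-1]*.

    Assumes k >= 1 distinct letters. A word is determined by the lengths of its
    k blocks, i.e. by a composition of n into k non-negative parts, and these
    are in bijection with (k-1)-subsets of n+k-1 positions ("stars and bars").
    """
    k = len(letters)
    bars = sorted(rng.sample(range(n + k - 1), k - 1))
    parts, prev = [], -1
    for b in bars + [n + k - 1]:
        parts.append(b - prev - 1)   # number of stars between consecutive bars
        prev = b
    return "".join(letter * m for letter, m in zip(letters, parts))

# There are math.comb(n + k - 1, k - 1) words of length n: polynomial in n.
print(math.comb(3 + 2 - 1, 2 - 1))   # 4 words of length 3 in a*b*
print(random_word_stars("abc", 10))
```

    Because random.sample returns a uniformly random ordered sample, sorting it gives a uniform subset of bar positions and hence a uniform word.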

    Tree decomposition and parameterized algorithms for RNA structure-sequence alignment including tertiary interactions and pseudoknots

    We present a general setting for structure-sequence comparison in a large class of RNA structures that unifies and generalizes a number of recent works on specific families of structures. Our approach is based on tree decomposition of structures and gives rise to a general parameterized algorithm, where the exponential part of the complexity depends on the family of structures. For each of the previously studied families, our algorithm has the same complexity as the specific algorithm that had been given before. Comment: (2012
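    To make the notion of tree decomposition concrete, the following sketch builds the graph of an RNA structure (backbone edges plus base-pair edges) and checks the three defining conditions of a tree decomposition: every vertex is in some bag, every edge is inside some bag, and the bags containing a given vertex form a connected subtree. It also returns the width (largest bag size minus one). The encoding and function names are our own assumptions, and this is not the alignment algorithm itself.

```python
from collections import defaultdict

def structure_graph(n, pairs):
    """Edges of an RNA structure on positions 0..n-1: backbone plus base pairs."""
    edges = {(i, i + 1) for i in range(n - 1)}
    edges |= {(min(i, j), max(i, j)) for i, j in pairs}
    return edges

def is_tree_decomposition(n, edges, bags, tree_edges):
    """bags: dict node -> set of vertices; tree_edges: assumed to form a tree on the nodes."""
    if not all(any(v in bag for bag in bags.values()) for v in range(n)):
        return False
    if not all(any(u in bag and v in bag for bag in bags.values()) for u, v in edges):
        return False
    adj = defaultdict(set)
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)
    for v in range(n):
        # The tree nodes whose bag contains v must induce a connected subtree.
        nodes = {t for t, bag in bags.items() if v in bag}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            t = stack.pop()
            for u in adj[t]:
                if u in nodes and u not in seen:
                    seen.add(u)
                    stack.append(u)
        if seen != nodes:
            return False
    return True

def width(bags):
    return max(len(bag) for bag in bags.values()) - 1

# A small crossing (pseudoknotted) structure: pairs (0, 3) and (2, 5).
E = structure_graph(6, [(0, 3), (2, 5)])
bags = {"a": {0, 1, 2}, "b": {0, 2, 3}, "c": {2, 3, 5}, "d": {3, 4, 5}}
print(is_tree_decomposition(6, E, bags, [("a", "b"), ("b", "c"), ("c", "d")]), width(bags))  # True 2
```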

    Flexible RNA design under structure and sequence constraints using formal languages

    The problem of RNA secondary structure design (also called inverse folding) is the following: given a target secondary structure, one aims to create a sequence that folds into, or is compatible with, this structure. In several practical applications in biology, additional constraints must be taken into account, such as the presence/absence of regulatory motifs, either at a specific location or anywhere in the sequence. In this study, we investigate the design of RNA sequences from their targeted secondary structure, given these additional sequence constraints. To this end, we develop a general framework based on concepts of language theory, namely context-free grammars and finite automata. We efficiently combine a comprehensive set of constraints into a unifying context-free grammar of moderate size. From there, we use generic algorithms to perform a (weighted) random generation, or an exhaustive enumeration, of candidate sequences. The resulting method, whose complexity scales linearly with the length of the RNA, was implemented as a standalone program. The resulting software was embedded into a publicly available dedicated web server. The applicability of the method was demonstrated on a concrete case study dedicated to Exon Splicing Enhancers, in which our approach was successfully used in the design of in vitro experiments. Comment: ACM BCB 2013 - ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics (2013
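    For intuition about uniform generation under such constraints, here is a deliberately simpler technique than the grammar-based construction described above: rejection sampling. Each paired position receives one of six base pairs and each unpaired position one of four bases, so every compatible sequence has the same probability. Rejecting sequences that contain a forbidden motif keeps the result uniform among the admissible sequences, but it can become very slow when the constraints are restrictive. The allowed pair set (canonical plus GU wobble), the function names and the parameters are assumptions for illustration.

```python
import random

PAIRS = ["GC", "CG", "AU", "UA", "GU", "UG"]  # assumed allowed pairs (canonical + wobble)
BASES = "ACGU"

def parse_dot_bracket(structure):
    """partner[i] = index paired with i, or -1 if i is unpaired."""
    stack, partner = [], [-1] * len(structure)
    for i, c in enumerate(structure):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif c != ".":
            raise ValueError(f"unexpected character {c!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return partner

def sample_compatible(structure, forbidden=(), rng=random, max_tries=10000):
    """Uniform sequence compatible with `structure` avoiding every forbidden motif."""
    partner = parse_dot_bracket(structure)
    for _ in range(max_tries):
        seq = [None] * len(structure)
        for i, j in enumerate(partner):
            if j == -1:
                seq[i] = rng.choice(BASES)
            elif i < j:
                seq[i], seq[j] = rng.choice(PAIRS)  # unpack a two-letter pair
        s = "".join(seq)
        if not any(motif in s for motif in forbidden):
            return s
    raise RuntimeError("no sequence found; constraints may be too strict")

print(sample_compatible("((((....))))", forbidden=["GGG"]))
```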

    Average complexity of the Jiang-Wang-Zhang pairwise tree alignment algorithm and of a RNA secondary structure alignment algorithm

    We prove that the average complexity of the pairwise ordered tree alignment algorithm of Jiang, Wang and Zhang is in O(nm), where n and m stand for the sizes of the two trees, respectively. We show that the same result holds for the average complexity of pairwise comparison of RNA secondary structures, using a set of biologically relevant operations.

    VARNA: Interactive drawing and editing of the RNA secondary structure.

    DESCRIPTION: VARNA is a tool for the automated drawing, visualization and annotation of the secondary structure of RNA, designed as a companion software for web servers and databases. FEATURES: VARNA implements four drawing algorithms, supports input/output using the classic formats dbn, ct, bpseq and RNAML, and exports drawings in five picture formats, either pixel-based (JPEG, PNG) or vector-based (SVG, EPS and XFIG). It also allows manual modification and structural annotation of the resulting drawing, through an interactive point-and-click interface, within a web server, or through command-line arguments. AVAILABILITY: VARNA is free software, released under the terms of the GPLv3.0 license and available at http://varna.lri.fr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
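    To illustrate two of the input formats listed above, the sketch below converts a sequence and its dot-bracket (dbn) structure into BPSEQ-style lines: one line per position giving the 1-based index, the base, and the index of its partner, with 0 for unpaired positions. It handles only plain parentheses (no pseudoknot brackets) and emits no header lines. It is an independent illustration, not VARNA's own parser, and the function name is hypothetical.

```python
def dot_bracket_to_bpseq(sequence, structure):
    """BPSEQ-style text ("index base partner", 0 = unpaired) from a dot-bracket string."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    stack, partner = [], [0] * len(structure)
    for i, c in enumerate(structure, start=1):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i - 1], partner[j - 1] = j, i
        elif c != ".":
            raise ValueError(f"unsupported character {c!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return "\n".join(f"{i} {b} {p}" for i, (b, p) in enumerate(zip(sequence, partner), start=1))

print(dot_bracket_to_bpseq("GGAACC", "((..))"))
```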

    Generating functions for generating trees

    Certain families of combinatorial objects admit recursive descriptions in terms of generating trees: each node of the tree corresponds to an object, and the branch leading to the node encodes the choices made in the construction of the object. Generating trees allow fast computation of enumeration sequences (sometimes yielding explicit formulae as well) and provide efficient random generation algorithms. We investigate the links between the structural properties of the rewriting rules defining such trees and the rationality, algebraicity, or transcendence of the corresponding generating function. Comment: This article corresponds, up to minor typo corrections, to the article submitted to Discrete Mathematics (Elsevier) in Nov. 1999, and published in its vol. 246(1-3), March 2002, pp. 29-5
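    The enumeration side of a generating tree can be sketched by propagating label multiplicities level by level. In this sketch a rewriting rule is a Python function from a label to the list of its children's labels (our own encoding). Two classical rules serve as examples: the rule with root (2) and (k) → (2)(3)...(k+1), whose level sizes are Catalan numbers, and the rule (1) → (2), (2) → (1)(2), which has finitely many labels and Fibonacci level sizes.

```python
from collections import Counter

def level_counts(root, rule, depth):
    """Number of nodes at levels 0..depth of the generating tree defined by `rule`."""
    level = Counter({root: 1})
    counts = [1]
    for _ in range(depth):
        nxt = Counter()
        for label, mult in level.items():
            for child in rule(label):
                nxt[child] += mult   # every node with this label has the same children
        level = nxt
        counts.append(sum(level.values()))
    return counts

def catalan_rule(k):
    return range(2, k + 2)           # (k) -> (2)(3)...(k+1)

def fibonacci_rule(k):
    return [2] if k == 1 else [1, 2]  # (1) -> (2), (2) -> (1)(2)

print(level_counts(2, catalan_rule, 5))    # [1, 2, 5, 14, 42, 132]
print(level_counts(1, fibonacci_rule, 5))  # [1, 1, 2, 3, 5, 8]
```

    Grouping nodes by label is what makes the computation fast: the work per level depends on the number of distinct labels, not on the number of nodes.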

    Uniform Random Sampling of Traces in Very Large Models

    This paper presents some first results on how to perform uniform random walks (where every trace has the same probability to occur) in very large models. The models considered here are described succinctly as a set of communicating reactive modules. The method relies upon techniques for counting and drawing uniformly at random words in regular languages. Each module is considered as an automaton defining such a language. It is shown how local uniform drawings of traces can be combined to obtain a global uniform random sampling, without constructing the global model.
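    Uniform random walks differ from naive ones. A walk that picks each outgoing transition with equal probability is generally not uniform over traces. The sketch below computes exact trace probabilities for such an isotropic walk on a small hypothetical automaton. It assumes a deterministic automaton in which every reached state has at least one outgoing transition. Here the three traces of length 2 get probabilities 1/4, 1/4 and 1/2, whereas a uniform sampler would give each 1/3.

```python
from fractions import Fraction

def isotropic_trace_probabilities(delta, start, n):
    """Exact probability of each length-n trace under a walk that picks every
    outgoing transition of the current state with equal probability."""
    probs = {("", start): Fraction(1)}
    for _ in range(n):
        nxt = {}
        for (trace, q), p in probs.items():
            out = delta.get(q, [])
            for letter, r in out:
                key = (trace + letter, r)
                nxt[key] = nxt.get(key, 0) + p / len(out)
        probs = nxt
    result = {}
    for (trace, _), p in probs.items():
        result[trace] = result.get(trace, 0) + p
    return result

delta = {0: [("a", 1), ("b", 0)], 1: [("c", 0)]}
print(isotropic_trace_probabilities(delta, 0, 2))
```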

    A new dichotomic algorithm for the uniform random generation of words in regular languages (journal version)

    We present a new algorithm for generating uniformly at random words of any regular language $\mathcal{L}$. When using floating-point arithmetic, its bit-complexity is $\mathcal{O}(q \log^2 n)$ in space and $\mathcal{O}(q n \log^2 n)$ in time, where $n$ stands for the length of the word and $q$ for the number of states of a deterministic finite automaton of $\mathcal{L}$. We implemented the algorithm and compared its behavior to that of state-of-the-art algorithms on a set of large automata from the VLTS benchmark suite. Both theoretical and experimental results show that our algorithm offers an excellent compromise between space and time requirements, compared to the best known alternatives. In particular, it is the only method that can generate long paths in large automata.
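    The divide-and-conquer idea suggested by a dichotomic approach can be sketched as follows. To draw a path of length m from state i to state j, draw the state reached at the midpoint with probability proportional to (number of paths i → r of length ⌊m/2⌋) × (number of paths r → j of length m − ⌊m/2⌋), then recurse on both halves. This is our own simplified sketch, not the paper's algorithm: it uses exact Python integers and naive matrix products instead of floating-point arithmetic, so it does not achieve the stated bit-complexity. The encoding (a dict from state pairs (i, j) to the letters labelling transitions i → j) and all names are assumptions.

```python
import random

def mat_mul(A, B):
    size = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(size)) for j in range(size)]
            for i in range(size)]

def mat_pow(M, e):
    """M**e by repeated squaring (exact integers)."""
    R = [[int(i == j) for j in range(len(M))] for i in range(len(M))]
    while e:
        if e & 1:
            R = mat_mul(R, M)
        M = mat_mul(M, M)
        e >>= 1
    return R

def weighted_index(weights, rng):
    """Index i with probability weights[i] / sum(weights) (non-negative integers)."""
    x = rng.randrange(sum(weights))
    for i, w in enumerate(weights):
        x -= w
        if x < 0:
            return i

def dichotomic_word(delta, q, start, finals, n, rng=random):
    """Uniform accepted word of length n >= 1 for a DFA with states 0..q-1."""
    M = [[len(delta.get((i, j), [])) for j in range(q)] for i in range(q)]
    powers = {}  # only O(log n) distinct lengths occur, so powers are memoized

    def paths(m):  # paths(m)[i][j] = number of paths of length m from i to j
        if m not in powers:
            powers[m] = mat_pow(M, m)
        return powers[m]

    def draw(i, j, m):
        if m == 1:
            return [rng.choice(delta[(i, j)])]
        h = m // 2
        left, right = paths(h), paths(m - h)
        r = weighted_index([left[i][s] * right[s][j] for s in range(q)], rng)
        return draw(i, r, h) + draw(r, j, m - h)

    targets = sorted(finals)
    f = targets[weighted_index([paths(n)[start][t] for t in targets], rng)]
    return "".join(draw(start, f, n))

# Words over {a, b} with no factor "aa".
delta = {(0, 0): ["b"], (0, 1): ["a"], (1, 0): ["b"]}
w = dichotomic_word(delta, 2, 0, {0, 1}, 1000)
print(len(w), "aa" in w)   # 1000 False
```

    Every path from i to j of length m splits uniquely at its midpoint state, so choosing that state with the product weight and then drawing both halves uniformly yields a uniform path.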

    Counting RNA pseudoknotted structures

    In 2004, Condon and coauthors gave a hierarchical classification of exact RNA structure prediction algorithms according to the generality of the structure classes they handle. We complete this classification by adding two recent prediction algorithms. More importantly, we precisely quantify the hierarchy by giving closed or asymptotic formulas for the theoretical number of structures of given size n in all the classes but one. This makes it possible to assess the tradeoff between the expressiveness and the computational complexity of RNA structure prediction algorithms.
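    As a point of comparison at the bottom of such a hierarchy, the number of pseudoknot-free secondary structures on n positions satisfies a classical recurrence: the last position is either unpaired or paired with some earlier position k, which splits the rest into an outside part and an inside part. The sketch covers only this nested class, not the pseudoknotted classes studied in the paper. The parameter theta (minimum distance j − i > theta for a pair (i, j)) and the function name are assumptions for illustration; with theta = 0 the counts are the Motzkin numbers.

```python
def count_secondary_structures(n, theta=0):
    """Number of pseudoknot-free secondary structures on n positions in which
    every base pair (i, j) satisfies j - i > theta."""
    S = [1] * (n + 1)  # S[m] for a segment of m positions; S[0] = 1
    for m in range(1, n + 1):
        total = S[m - 1]                   # position m left unpaired
        for k in range(1, m - theta):      # position m paired with position k
            total += S[k - 1] * S[m - k - 1]  # outside part times inside part
        S[m] = total
    return S[n]

print([count_secondary_structures(n) for n in range(7)])  # [1, 1, 2, 4, 9, 21, 51]
```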

    Homology modeling of complex structural RNAs

    Aligning macromolecules such as proteins, DNAs and RNAs in order to reveal, or conversely exploit, their functional homology is a classic challenge in bioinformatics, with far-reaching applications in structure modelling and genome annotation. In the specific context of complex RNAs, featuring pseudoknots, multiple interactions and non-canonical base pairs, multiple algorithmic solutions and tools have been proposed for the structure/sequence alignment problem. However, such tools are seldom used in practice, due in part to their extreme computational demands and to their inability to support general types of structures. Recently, Rinaudo et al. designed a general parameterized algorithm based on tree decomposition of the query structure. We present an implementation of this algorithm within a tool named LiCoRNA and compare it against state-of-the-art algorithms. We show that it both gracefully specializes into a practical algorithm for simple classes of pseudoknots and offers a general solution for complex pseudoknots, which are explicitly out of reach of competing software.